DEPARTMENT OF HOMELAND SECURITY

Office of Inspector General

Survey of DHS Data Mining Activities

Office of Information Technology

DHS' Data Mining and Advanced Analytics Efforts Could Be Optimized

OIG-06-56                                                          August 2006

Office of Inspector General

U.S. Department of Homeland Security
Washington, DC 20528

August 15, 2006

Preface

The Department of Homeland Security (DHS) Office of Inspector General (OIG)
was established by the Homeland Security Act of 2002 (Public Law 107-296) by
amendment to the Inspector General Act of 1978. This is one of a series of
audit, inspection, and special reports prepared as part of our oversight
responsibilities to promote economy, effectiveness, and efficiency within the
Department.

This report identifies and describes a selection of the Department's data
mining and advanced analytics activities that contribute to counterterrorism
efforts. It is based on direct observations, review of applicable documents,
and interviews with Department officials, program managers, and technical
staff.

It is our hope that this report will result in more effective, efficient, and
economical operations. We express our appreciation to all of those who
contributed to the preparation of this report.

Richard L. Skinner
Inspector General

Contents

Executive Summary
Background
Results of Survey
    Description of Identified Data Mining Activities
Management Comments
Appendices
    Appendix A: Purpose, Scope, and Methodology
    Appendix B: Major Contributors to this Report
    Appendix C: Report Distribution

Abbreviations
    ACE S1    Automated Commercial Environment Screening and Targeting
              Release S1
    ADVISE    Analysis, Dissemination, Visualization, Insight, and Semantic
              Enhancement
    ATS       Automated Targeting System
    CBP       United States Customs and Border Protection
    CIO       Chief Information Officer
    CIS       United States Citizenship and Immigration Services
    CVS       Crew Vetting System
    DARTTS    Data Analysis and Research for Trade Transparency System
    DHS       Department of Homeland Security
    DOJ       U.S. Department of Justice
    FAS       Freight Assessment System
    FEMA      Federal Emergency Management Agency
    FLETC     Federal Law Enforcement Training Center
    I2F       Intelligence and Information Fusion
    ICE       Immigration and Customs Enforcement
    ICEPIC    Immigration and Customs Enforcement Pattern Analysis and
              Information Collection System
    IT        Information Technology
    NETLEADS  Law Enforcement Analysis Data System
    NIPS      Numerical Integrated Processing System
    OCIO      Office of the Chief Information Officer
    OIA       Office of Intelligence and Analysis
    OIG       Office of Inspector General
    OLAP      On-Line Analytical Processing
    QID       Questioned Identification Documents

                     Survey of DHS Data Mining Activities
                                    Page 2

    RMRS      Risk Management Reporting System
    S&T       Science and Technology
    TISS      Tactical Information Sharing System
    TSA       Transportation Security Administration
    USSS      United States Secret Service
    US VISIT  United States Visitor and Immigrant Status Indicator Technology
    VISAT     Vulnerability Identification Self-Assessment Tool

Tables
    Table 1   Common Data Mining Uses
    Table 2   Expert Systems
    Table 3   Association Processes
    Table 4   Threat and Risk Assessment Tools
    Table 5   Collaboration and Visualization Processes
    Table 6   Advanced Analytics

Executive Summary

We surveyed the Department of Homeland Security (DHS) to identify and describe
data mining activities used to support the counterterrorism mission. Data
mining and advanced analytics are evolving technologies that help discover
patterns and relationships in vast quantities of data. Data mining employs
techniques from statistics, machine learning, database management, and
visualization. These techniques support the work of analysts, agents, and
investigators and present knowledge in a form that informs decision-makers.
While various definitions of data mining exist, for the purpose of our survey
we defined data mining broadly, to illustrate the range of applications and
tools that the Department uses to assist DHS personnel with knowledge
discovery, predictive modeling, and analytics.

We identified 12 systems and capabilities that DHS personnel use to perform
data mining activities in support of DHS' counterterrorism mission. Nine
systems are operational and three systems are under development.
Although these activities perform a variety of processes, we grouped our
descriptions into categories that range from basic to advanced analytical
tasks: expert systems, association processes, threat and risk assessment
tools, collaboration and visualization processes, and advanced analytics.

Background

While various definitions of data mining exist, we defined data mining as the
process of knowledge discovery, predictive modeling, and analytics.
Traditionally, this involves the discovery of patterns and relationships in
structured databases of historical occurrences.[1] However, data mining
technology has expanded to include other processes, technologies, and
methodologies.[2]

Since the early 1900s, prediction has been a central goal of traditional
statistics. The tools used for prediction have matured and evolved over time.
For example, during the 1980s, researchers in artificial intelligence took
advantage of increased computing power to go beyond traditional statistical
techniques. They introduced new methods of prediction, classification, and
clustering using neural networks, self-organizing maps, and genetic and
machine learning algorithms capable of recognizing patterns in extremely large
databases with high accuracy.[3] Today, data mining activities have been
incorporated into sophisticated analytical, modeling, and predictive systems
that perform pattern recognition on structured and unstructured data.[4][5]

[1] Traditional data mining infers rules or codes to predict future results
via classification or segmentation processes.
[2] Related data mining processes include name matching; entity, event, and
expression extraction from unstructured content such as text, images, audio,
and video; the clustering of observations or events; aberration or anomaly
detection; information matching and sharing via link analysis; visualization;
and the generation of alerts to personnel or other software agents.

Analytics and modeling have become so pervasive and essential to private
industry that they drive business intelligence for many different enterprises.
Incorporating data mining tools into analytical processes benefits
organizations and their analysts. It helps them enter new lines of business,
identify and retain their best customer prospects, adapt operations quickly to
changes in supply or demand, identify the parameters that influence sales
trends, and optimize business operations and performance. Although data mining
does not replace an analyst's expertise, it automates some of the laborious
tasks that an analyst performs. It also helps summarize large quantities of
data into meaningful information with which the analyst can work. The roots of
these sophisticated intelligence systems are in traditional statistics,
machine learning, Internet standards, software agents, and computational
linguistics.[6][7]

Some key goals of data mining are to understand behaviors, forecast trends and
demands, track performance, and transform seemingly unrelated data into
meaningful information.
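Footnote 1 describes traditional data mining as inferring rules that predict future results through classification. The Python sketch below illustrates that idea with the simple "one rule" (1R) classifier. It does not describe any DHS system. For each attribute, the classifier maps every observed value to that value's most common outcome. It then keeps the attribute whose mapping makes the fewest errors on the historical records. The records, attribute names (`route`, `packaging`), labels, and fallback label are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (attributes, observed outcome).
HISTORY = [
    ({"route": "R1", "packaging": "crate"}, "high"),
    ({"route": "R1", "packaging": "box"}, "high"),
    ({"route": "R2", "packaging": "crate"}, "low"),
    ({"route": "R2", "packaging": "box"}, "low"),
    ({"route": "R3", "packaging": "crate"}, "low"),
]


def learn_one_rule(records):
    """1R: for each attribute, map every value to its majority outcome and
    keep the attribute whose mapping misclassifies the fewest records."""
    best = None
    for attr in records[0][0]:
        counts = defaultdict(Counter)  # attribute value -> outcome counts
        for features, label in records:
            counts[features[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[f[attr]] != label for f, label in records)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


def predict(model, features, default="low"):
    """Apply the learned rule; values never seen in training get `default`
    (a hypothetical fallback chosen for this sketch)."""
    attr, rule, _ = model
    return rule.get(features.get(attr), default)


model = learn_one_rule(HISTORY)
print(model)
print(predict(model, {"route": "R1", "packaging": "box"}))
```

Practical classifiers, such as the neural networks and decision trees mentioned in this report, weigh many attributes at once. 1R is shown here only because its learned rule is small enough to read directly.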
Today, private industry and government use data mining as part of their
normal course of business, as illustrated in Table 1.

[3] Algorithms provide step-by-step details for particular ways of
implementing data mining techniques, such as neural networks, decision trees,
self-organizing maps, Bayesian networks, and machine learning.
[4] Structured data refers to sources that represent a collection of records
stored in a computer in a systematic way, with each record organized in a
definitive schema describing the objects represented in the database and the
relationships among them.
[5] Unstructured data refers to computerized information that does not have a
data structure. This may include audio, video, and unstructured text such as
e-mails or documents.
[6] A software agent is a program that can exercise an individual's or
organization's authority, work autonomously toward a goal, and meet and
interact with other agents.
[7] Computational linguistics is an interdisciplinary field dealing with the
statistical and logical modeling of natural language from a computational
perspective.
Computational linguistics originated with efforts in the United States in the
1950s to have computers automatically translate foreign languages into English.

Table 1: Common Data Mining Uses

COMMERCIAL USES
  • To analyze and segment customer buying patterns and identify potential
    goods and services that are in demand.
  • To identify and prevent fraudulent and abusive billing practices.
  • To analyze sales trends and predict the effectiveness of promotions.
  • To predict the effectiveness of surgical procedures, medical tests, and
    medications.
  • To search a number of documents and written sources for information on a
    particular topic (text mining).
  • To identify trends and present statistics in ways that are easily
    understood and useful.
GOVERNMENT USES
  • To monitor expenditures of employee travel and purchase cards.
  • To quickly access information that speeds up the overall security
    clearance investigation process for employees.
  • To identify improper payments under federal benefit and loan programs and
    help detect instances of fraud, waste, and abuse.
  • To rank programs quickly by using
    established performance indicators.
  • To assist law enforcement in combating terrorism.

Results of Survey

Description of Identified Data Mining Activities

The Homeland Security Act of 2002 requires DHS to use data mining tools and
other advanced analytics to access, receive, and analyze law enforcement and
intelligence information for the purpose of identifying potential threats of
terrorism within the United States.[8] While serving as Assistant Attorney
General of the Criminal Division of the U.S. Department of Justice (DOJ), the
Secretary of DHS stated that data mining is also a promising tool for
thwarting terrorism.[9] DHS is using data mining to achieve its strategic goals
of awareness and prevention. Under the strategic goal of awareness, it is the
Department's duty to identify and understand threats, assess vulnerabilities,
determine potential impacts, and disseminate timely information to its
homeland security partners and the public.[10] DHS is also committed to
preventing terrorism by implementing the technologies and capabilities to
detect and prevent terrorist attacks.[11] Advances in pattern recognition,
networking, and encryption technologies give DHS a means to "connect the dots"
more efficiently to combat terrorism and secure the United States.[12]

[8] Homeland Security Act of 2002, 6 U.S.C. § 121(d)(14) (2002).
[9] The Financial War on Terrorism and the Administration's Implementation of
the Anti-Money Laundering Provisions of the USA Patriot Act: Hearing Before the
Senate Comm. on Banking, Housing, and Urban Affairs, 107th Cong. (2002)
(statement of Michael Chertoff, Assistant Att'y Gen., Criminal Div., U.S.
D.O.J.).
[10] Securing Our Homeland: U.S. Department of Homeland Security Strategic
Plan, 9 (February 24, 2004).
[11] Securing Our Homeland: U.S. Department of Homeland Security Strategic
Plan, 16 (February 24, 2004).

We identified 12 systems and capabilities within DHS that support data mining
activities. They reside within United States Customs and Border Protection
(CBP), Immigration and Customs Enforcement (ICE), the Office of Intelligence
and Analysis (OIA), the United States Secret Service (USSS), and the
Transportation Security Administration (TSA). These systems and capabilities
perform a variety of functions that contribute to the counterterrorism effort.
Nine systems are operational and three systems are under development.

In the following section, we describe the 12 data mining activities identified
during our review.
Although these activities perform various analytical processes, we grouped
them to illustrate five types of analytics performed at DHS: expert systems,
association processes, threat and risk assessment tools, collaboration and
visualization processes, and advanced analytics.

Expert Systems

An expert system is a class of computer programs first developed by
researchers in artificial intelligence during the 1970s. In essence, these
programs consist of a set of human-developed rules that analyze information
about a specific class of problems, provide analysis of the problem(s), and,
depending upon their design, recommend a course of action for the user.

We identified two systems in the Department that are considered expert
systems: the Automated Commercial Environment Screening and Targeting Release
S1 (ACE S1) and the Freight Assessment System (FAS).
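The structure of an expert system described above is a set of human-written rules that examine a case and recommend a course of action. The Python sketch below shows that structure in a few lines. Everything in it is hypothetical: the rule names, shipment fields, 10 percent tolerance, priorities, and actions are illustrations only. They do not represent the rules used in ACE S1, ATS, or FAS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # human-written test on one shipment
    action: str                        # recommended course of action
    priority: int                      # higher wins when several rules fire


# Hypothetical rules for illustration only.
RULES = [
    Rule("unknown shipper",
         lambda s: s["shipper_status"] == "unknown",
         "document review", 1),
    Rule("weight mismatch over 10%",
         lambda s: abs(s["declared_kg"] - s["measured_kg"]) * 10
         > s["declared_kg"],
         "physical examination", 2),
    Rule("vague description",
         lambda s: s["description"].lower() in {"general merchandise", "parts"},
         "document review", 1),
]


def assess(shipment, rules=RULES):
    """Fire every matching rule and recommend the highest-priority action."""
    fired = [r for r in rules if r.condition(shipment)]
    if not fired:
        return [], "no action"
    top = max(fired, key=lambda r: r.priority)
    return [r.name for r in fired], top.action


print(assess({"shipper_status": "unknown", "declared_kg": 100,
              "measured_kg": 130, "description": "Parts"}))
```

Because each recommendation comes with the names of the rules that fired, an analyst can trace why a shipment was selected. This transparency is one practical property of human-written rule sets.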
Table 2 summarizes the purposes of ACE S1 and FAS.

[12] Encryption is the process of obscuring information to make it unreadable.

Table 2: Expert Systems

Expert Systems: A group of computer programs composed of a set of
human-developed rules that analyze information about a specific class of
problems.

Automated Commercial Environment Screening and Targeting Release S1 (ACE S1)
  Purpose: Identifies the highest-risk cargo shipments for further detailed
    examination by agents and inspectors.
  Directorate Mission: CBP – Prevent terrorists and terrorist weapons from
    entering the United States by eliminating potential threats before they
    arrive at our borders and ports.

Freight Assessment System (FAS)
  Purpose: Currently under development. Pre-screens and identifies cargo that
    has an elevated risk, enabling agents to use efficient inspection methods
    for cargo.
  Directorate Mission: TSA – Protect the Nation's transportation systems to
    ensure freedom of movement for people and commerce.

CBP's ACE S1 release is part of a long-term plan for modernizing the
screening and targeting of high-risk shipments to assist agents and inspectors
at our borders. It employs an expert system, the Automated Targeting System
(ATS), that screens electronic shipment data against criteria that could
indicate high-risk cargo. ACE S1 primarily uses custom-built software to
perform basic data mining. Its features include a centralized database that
stores all screening and targeting criteria and results. Plans for future
releases include extracting knowledge from unstructured data, integrating
disparate data and observations, and prototypes such as geospatial event
mapping.

ACE S1 automates the use of information to sort high-risk cargo entering the
U.S. and target it for further examination or inspection, but the system has
some limitations.
For example, ACE S1 uses a business rules engine to enhance screening of
manifest and entry transactions.[13] These business rules associate values
with a party to a transaction, such as an importer, manufacturer, or broker.
Based on these values, the business rules engine requests an action on the
transaction, such as a document review, an examination, or stopping an
individual. Business rules can be modified through user committees, which can
reinforce existing biases or introduce new ones into the rules engine.
Currently, ACE S1 does not include an automated mechanism that would enable CBP
to objectively measure the accuracy, performance, or error rates of the rules.
As a result, CBP might not be able to determine whether the system does what it
was originally developed to do. In addition, the system performs analysis on
limited data types, although it is capable of processing data from other
sources.

[13] Business rules engine refers to a set of rules for entering data in a
database that are specific to an enterprise's methods of conducting its
operations.

The second system we identified as using expert system technology is FAS. TSA
is developing FAS to pre-screen cargo before it enters our nation's
transportation system. FAS will identify cargo that has an elevated risk.
The identified cargo will be flagged and set aside for further inspection by
air carriers. To reduce the current reliance on random inspections, TSA plans
for FAS to use a human-developed risk rules engine.

Currently, the performance of the risk model and of the targeting effort is
evaluated through a series of statistical analyses and data quality reports.
Unless these evaluations take anomalies into account, TSA staff will not know
whether the system and its rules are performing as intended.[14] TSA's future
plans include incorporating automated analysis (machine-based rules) and using
additional data sources to identify and assess high-risk cargo.

TSA is developing FAS using historical data on past shipments. Designers are
trying to identify unique information elements for pattern recognition. A test
of this approach during a pilot phase of FAS revealed the need for a larger
test population.
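The report notes two gaps in how rule performance is assessed. ACE S1 lacks an automated way to measure the accuracy and error rates of its rules. FAS performance is currently assessed through statistical and data quality reports. The Python sketch below illustrates one common way to score a targeting rule against historical inspection outcomes: a confusion matrix and the rates derived from it. The rule (`weight_mismatch`, with a 10 percent tolerance) and the records are hypothetical. They are not drawn from ACE S1 or FAS.

```python
# Hypothetical targeting rule: flag a shipment whose measured weight
# differs from its declared weight by more than 10 percent.
def weight_mismatch(shipment):
    declared = shipment["declared_kg"]
    return abs(declared - shipment["measured_kg"]) * 10 > declared


# Hypothetical history: (shipment, whether inspection found a problem).
HISTORY = [
    ({"declared_kg": 100, "measured_kg": 130}, True),
    ({"declared_kg": 200, "measured_kg": 260}, False),
    ({"declared_kg": 50, "measured_kg": 51}, False),
    ({"declared_kg": 80, "measured_kg": 84}, True),
    ({"declared_kg": 10, "measured_kg": 20}, True),
    ({"declared_kg": 300, "measured_kg": 305}, False),
]


def evaluate(rule, history):
    """Compare a rule's flags with known outcomes via a confusion matrix."""
    tp = fp = fn = tn = 0
    for shipment, problem_found in history:
        flagged = rule(shipment)
        if flagged and problem_found:
            tp += 1          # correctly flagged
        elif flagged:
            fp += 1          # flagged, but nothing found
        elif problem_found:
            fn += 1          # missed by the rule
        else:
            tn += 1          # correctly passed
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


for name, value in evaluate(weight_mismatch, HISTORY).items():
    print(f"{name}: {value:.2f}")
```

Outcomes are known only for cargo that was actually inspected. An evaluation limited to flagged shipments therefore cannot count missed problems (false negatives). This is one reason a sample of random inspections can remain useful as a baseline.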
Because TSA does not have a large historical database from which to develop
the indicators needed to build the rules, it plans to develop and incorporate
predictive indicators in the future.

One problem unique to TSA's use of FAS is that TSA does not regulate
shippers.[15] Therefore, the information that TSA has on shippers is provided
on a voluntary basis, which limits the amount and type of data that it
receives.[16] TSA plans to use additional sources of information to help verify
a shipping company's identity and legitimacy. Additionally, vetting shippers
against additional sources, such as reports of shipper violations, will help
TSA identify known shippers.

[14] Anomaly detection compares a profile of allowed or expected attributes
against a population, with any deviation from that profile flagged as a
potential risk.
[15] Known shippers are entities that have routine business dealings with
freight forwarders or air carriers and are considered trusted shippers. In
contrast, unknown shippers are entities that have conducted limited or no
prior business with a freight forwarder or air carrier.
[16] According to a TSA official, TSA has issued a Final Rule that will
require all air carriers and indirect air carriers to provide all known
shipper information to the TSA-managed Known Shipper Management System by
December 1, 2006.

Association Processes

Association refers to the process of discovering that two or more variables
are related. Association, however, does not imply a direct causal connection
between the associated variables. This process operates across a variety of
platforms and can quickly search through vast sources of data to identify
co-occurrence. It employs algorithms to perform link analysis and to uncover
associations that are normally difficult to detect. For example, association
processes can show that persons A, B, and C are at the same location at the
same time. Link analysis is used to uncover, interpret, and display
relationships between persons, places, and events in a visual format.

We identified four data mining activities that use association processes to
perform analysis. These are the Data Analysis and Research for Trade
Transparency System (DARTTS), the Immigration and Customs Enforcement Pattern
Analysis and Information Collection System (ICEPIC), the Law Enforcement
Analysis Data System (NETLEADS), and the Crew Vetting System (CVS). Table 3
summarizes their respective purposes.

Table 3: Association Processes

Association Processes: The process of discovering that two or more variables
are related.

Data Analysis and Research for Trade Transparency System (DARTTS)
  Purpose: Assists agents in identifying and detecting money laundering, drug
    trafficking, and other illegal activities through financial transactions.
  Directorate Mission: ICE – Protect the United States and uphold public
    safety by identifying criminal activities and eliminating vulnerabilities
    that pose a threat to our nation's border, as well as economic,
    transportation, and infrastructure security.

Immigration and Customs Enforcement Pattern Analysis and Information
Collection System (ICEPIC)
  Purpose: Enables investigators to conduct targeted checks of non-resident
    aliens and provides leads for the disruption of potential terrorist
    activities.
  Directorate Mission: ICE (same as DARTTS).

Law Enforcement Analysis Data System (NETLEADS)
  Purpose: Supports agents in identifying criminal activity patterns and
    trends and associations among criminal organizations.
  Directorate Mission: ICE (same as DARTTS).

Crew Vetting System (CVS)
  Purpose: Assists analysts in screening air carrier personnel to ensure the
    security of air transportation.
  Directorate Mission: TSA – Protect the Nation's transportation systems to
    ensure freedom of movement for people and commerce.

DARTTS is a legacy system of the former U.S. Customs Service. It was developed
to assist agents in identifying and detecting money laundering, drug
trafficking, and other illegal activities. The system now resides under ICE.
This small-scale, stand-alone system uses commercial off-the-shelf software to
aid in the analysis of structured data found in databases. System owners from
the ICE Financial and Trade Investigations Division's Trade Transparency Unit
are collaborating with the ICE Office of the Chief Information Officer and the
Information Systems Security Manager to provide DARTTS to users in a web
environment.

DARTTS supports data mining and analysis that are not available in other
systems.
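The association example given earlier in this section shows persons A, B, and C at the same location at the same time. The Python sketch below expresses that idea in code. It groups hypothetical sighting records by place and date, then counts how often each pair of persons co-occurs. The pair counts correspond to the edges a link-analysis display would draw. The records are invented, and the sketch does not describe how DARTTS or any other DHS system works.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sighting records: (person, place, date).
SIGHTINGS = [
    ("A", "Location 1", "2006-03-01"),
    ("B", "Location 1", "2006-03-01"),
    ("C", "Location 1", "2006-03-01"),
    ("A", "Location 2", "2006-03-05"),
    ("D", "Location 2", "2006-03-05"),
    ("B", "Location 3", "2006-03-07"),
]


def co_occurrence_links(sightings):
    """Group people by (place, date), then count how often each pair
    appears in the same group."""
    groups = defaultdict(set)
    for person, place, date in sightings:
        groups[(place, date)].add(person)
    links = defaultdict(int)
    for people in groups.values():
        # Sorting gives each unordered pair a single, stable key.
        for pair in combinations(sorted(people), 2):
            links[pair] += 1
    return dict(links)


links = co_occurrence_links(SIGHTINGS)
for (a, b), count in sorted(links.items()):
    print(f"{a} <-> {b}: seen together {count} time(s)")
```

As the report notes, an association does not by itself imply a causal connection. A co-occurrence count is a lead for an analyst to examine, not a conclusion.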
For instance, this system allows the user to produce aggregate totals for importation of currency and then sort them by any number of variables, such as country of origin, party name, or total currency value. The ICE Financial and Trade Investigations Division's Trade Transparency Unit also uses DARTTS as its platform for sharing and analyzing U.S. and foreign trade data, pursuant to Customs Mutual Assistance agreements. This allows the user to see the "bigger picture," permitting investigators to identify anomalies that can be indicative of trade-based money laundering or other import-export crimes.

The second association process, ICEPIC, was developed to assist investigators in disrupting and preventing terrorist activities. ICEPIC is a small-scale system that employs a variety of commercial off-the-shelf and government off-the-shelf software to support criminal investigators. ICEPIC uses matching to integrate and confirm information from structured data sources in DHS databases, and it uses associations to discover patterns and relationships. The system connects to ICE's network to aid investigators in generating leads, conducting batch queries of names, and reporting.

ICE uses a third system, NETLEADS, in the area of criminal investigations.
It is a web-enabled intelligence and investigations analysis database repository. NETLEADS tools give users the capability to rapidly search and conduct analysis. The system is designed to support agents in identifying criminal activity patterns, trends, and associations among criminal organizations, and it uses commercial off-the-shelf products to discover relationships within data.

The system contains fifty million indexed names as well as intelligence, subject records, investigation reports, and global intelligence information on topics such as smuggling, terrorism, and transnational trends. NETLEADS provides an integrated common interface for querying intelligence and enforcement applications and data.

NETLEADS includes hosted data marts of intelligence and investigative information.17 The system was built on a proprietary design and uses association, name matching, and data queries. Investigators and analysts use it to identify connections between persons of interest or persons under investigation. Users can access multiple government and commercial databases to discover patterns and relationships. Benefits of this system include the capability to search multiple databases and assemble information into a common analytical environment.

17 A data mart is a specialized version of a data warehouse. Like data warehouses, data marts contain a snapshot of operational data that helps people to strategize based on analyses of past trends and experiences. In contrast to a data warehouse, the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of selected data.

The last system we identified as using an association process is CVS. TSA uses this system to screen commercial air carrier personnel to help ensure the security of air transportation. CVS encompasses Flight Crew Manifest and Master Crew List vetting. These two sub-programs ensure that 100% of the crew members of commercial air carriers flying into, out of, over, and through the U.S. are vetted on a recurring basis. The Flight Crew Manifest process ensures that the people actually on the aircraft are vetted each time they fly. The Master Crew List population is subject to perpetual vetting: whenever data changes, or a crew member is placed on or removed from a watch list, the system triggers a notification that new data has resulted in a match, which is then referred for appropriate action.

CVS uses proprietary software to perform data element matching of names, social security numbers, passports, and other pertinent information. It also uses commercial off-the-shelf products for error checking, message processing, and logging.

Threat and Risk Assessment Tools

Risk management is the process of identifying risk, assessing risk, and taking steps to reduce risk to an acceptable level. Threat and risk assessments are widely recognized as decision support tools to establish and prioritize security program requirements.
A risk-based approach allows organizations to make better judgments about where to deploy resources and where to prioritize protection efforts.

We identified one tool, the Risk Management Reporting System (RMRS), that uses a threat and risk-based approach to analyze results and conduct data mining activities. RMRS generates risk-assessed scores for assets based on advanced analytics. Table 4 summarizes its purpose.

Table 4: Threat and Risk Assessment Tools

Threat and risk assessment tools: decision support tools that enable a deliberate, analytical approach to identifying which threats can exploit vulnerabilities in an organization's specific assets.

Data Mining Activity: Risk Management Reporting System (RMRS)
Purpose: Collects information and generates a score based on the level of risk for assets comprising our Nation's critical infrastructures.
Directorate Mission: TSA – Protect the Nation's transportation systems to ensure freedom of movement for people and commerce.

RMRS is a stand-alone system that TSA uses to store and analyze risk. Because RMRS is flexible and modular, it can store and analyze risk for assets including, but not limited to, maritime facilities and vessels, airports, mass transit, and public assembly facilities such as stadiums and arenas. RMRS generates a score based on the level of risk associated with a particular asset.
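The survey does not describe how RMRS computes its scores. As an illustration only, the Python sketch below assumes each asset is rated on three hypothetical 1-to-5 scales that mirror the factors this report attributes to RMRS (likelihood of attack, attractiveness of the target, and consequences) and multiplies them into a single score used to rank assets. The class, field names, scales, and multiplicative formula are assumptions, not TSA's algorithm.

```python
from dataclasses import dataclass


@dataclass
class AssetAssessment:
    """Hypothetical assessment record; field names and 1-5 scales are assumptions."""
    name: str
    likelihood: int      # likelihood of a terrorist attack, 1 (low) to 5 (high)
    attractiveness: int  # attractiveness of the asset as a target, 1 to 5
    consequence: int     # life, economic, and psychological impact, 1 to 5


def risk_score(asset: AssetAssessment) -> int:
    """Multiply the three factors into a score from 1 to 125 (illustrative formula)."""
    for value in (asset.likelihood, asset.attractiveness, asset.consequence):
        if not 1 <= value <= 5:
            raise ValueError(f"{asset.name}: rating {value} is outside the 1-5 scale")
    return asset.likelihood * asset.attractiveness * asset.consequence


def rank_assets(assets: list[AssetAssessment]) -> list[tuple[str, int]]:
    """Return (name, score) pairs sorted from highest to lowest risk."""
    scored = [(asset.name, risk_score(asset)) for asset in assets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = [
        AssetAssessment("stadium", likelihood=2, attractiveness=4, consequence=5),
        AssetAssessment("rail station", likelihood=3, attractiveness=3, consequence=4),
        AssetAssessment("highway bridge", likelihood=2, attractiveness=2, consequence=3),
    ]
    for name, score in rank_assets(sample):
        print(f"{name}: {score}")
```

With these sample ratings, the stadium (2 × 4 × 5 = 40) ranks above the rail station (36) and the highway bridge (12). A multiplicative form is one common choice because an asset rated low on any single factor stays low overall; a weighted sum would be an equally plausible assumption.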
The information in the RMRS database is reported by facility managers, security personnel, and law enforcement agents and entered into one of the tools associated with RMRS, such as TSA's Vulnerability Identification Self-Assessment Tool (VISAT).

VISAT is a voluntary, online assessment tool that helps transportation asset owners and operators enhance security for assets including maritime vessels, heavy railways, subways, rail stations, and highway bridges. The goal of VISAT is to raise the level of security awareness in public assembly facilities nationwide and to establish a common baseline of security awareness from which these facilities can build their protection plans. In addition to VISAT, RMRS captures asset information from tools that TSA field inspectors use when conducting independent inspections. RMRS processes the information captured from these tools using algorithms that generate an impact score assessing the likelihood of a terrorist attack, the attractiveness of the target, and the consequences of an act of terrorism. RMRS also captures and analyzes criticality information, including the potential life-threatening, economic, and psychological impacts of threat scenarios applied to particular assets.

Collaboration and Visualization Processes

Collaboration and visualization processes assist agents, analysts, and investigators in collecting and analyzing information. Collaboration is the strategic management process of collecting, tagging, classifying, organizing, and applying an organization's internal content and expertise. Visualization processes aid in analyzing data sets by defining views, highlighting findings, navigating on specific features to find trends, and pinpointing exceptions.
Data visualization simplifies the presentation of information while maintaining its integrity and depth. These technologies are also suited to very large and highly heterogeneous data sets.

We identified four processes that primarily use collaboration or visualization to perform data mining activities: Intelligence and Information Fusion (I2F), Numerical Integrated Processing System (NIPS), Questioned Identification Documents (QID), and Tactical Information Sharing System (TISS). Table 5 summarizes their purposes in relation to the respective missions.

Table 5: Collaboration and Visualization Processes

Collaboration: the strategic management process of collecting, tagging, classifying, organizing, and applying an organization's internal content and expertise.
Visualization: presents large data sets by defining views, highlighting findings, flagging exceptions, and similar techniques.

Data Mining Activity: Intelligence and Information Fusion (I2F)
Purpose: Currently under development. Will provide information analysts with state-of-the-art analysis tools that aid in the discovery and tracking of terrorism threats to the U.S. population and infrastructure.
Directorate or Critical Agency Mission: OIA – Gather, analyze, and disseminate information in a mission-oriented manner.

Data Mining Activity: Numerical Integrated Processing System (NIPS)
Purpose: Assists agents in identifying anomalies indicative of criminal activity, such as immigration violations, customs fraud, export violations, drug smuggling, and terrorism.
Directorate or Critical Agency Mission: ICE – Protect the United States and uphold public safety by identifying criminal activities and eliminating vulnerabilities that pose a threat to our nation's border, as well as economic, transportation, and infrastructure security.

Data Mining Activity: Questioned Identification Documents (QID)
Purpose: Allows analysts to compare questionable documents against genuine documents such as passports and driver's licenses.
Directorate or Critical Agency Mission: USSS – Protect key individuals and investigate crimes related to counterfeiting and the financial sector, including identity theft, computer fraud, and cyber attacks.

Data Mining Activity: Tactical Information Sharing System (TISS)
Purpose: Captures observations of suspicious activities in aviation and provides law enforcement officials with information for examining long-term trends and patterns.
Directorate or Critical Agency Mission: TSA – Protect the Nation's transportation systems to ensure freedom of movement for people and commerce.

The purpose of I2F is to make operational an integrated intelligence and information capability for DHS. This capability will enable intelligence analysts to understand relationships that would otherwise not be readily apparent.
I2F is in early development and depends primarily on the analyst manually processing, compiling, and analyzing data. The next version of the system will be a set of integrated tools and technologies to support the intelligence analyst.

I2F provides intelligence analysts with tools that aid in the discovery and tracking of terrorism threats to the United States population and infrastructure. I2F is made up principally of commercial off-the-shelf software but also integrates government off-the-shelf programs. These programs are used for entity extraction, search capabilities, and link analysis.18

The second system, NIPS, is a web-based strategic analytical tool used by DHS agents and analysts to manipulate, compare, and analyze large data sets of commercial, passenger, financial, and enforcement data. Stakeholders in this project include the ICE Intelligence Division, ICE Field Intelligence Units, ICE Office of Investigations, CBP Intelligence Division, CBP Office of Field Operations, CBP National Targeting Center, Container Security Initiative, and Customs-Trade Partnership Against Terrorism.

NIPS enables users to identify anomalies that are indicative of possible criminal activity, including illicit actions in support of terrorism, money laundering, tax evasion, weapons proliferation, immigration violations, and drug smuggling.
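The survey does not say which statistical methods NIPS uses to find anomalies. The Python sketch below is a hypothetical illustration of one simple approach: roll records up along one dimension (as an OLAP-style query would) and flag groups whose total is far above the median group total. The record fields, the sample data, and the three-times-median threshold are all assumptions.

```python
from collections import defaultdict
from statistics import median


def roll_up(records, dimension, measure):
    """Sum `measure` for each value of `dimension` (an OLAP-style roll-up)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record[measure]
    return dict(totals)


def flag_anomalies(totals, factor=3.0):
    """Return the group keys whose total exceeds `factor` times the median total."""
    if not totals:
        return []
    typical = median(totals.values())
    return sorted(key for key, total in totals.items() if total > factor * typical)


if __name__ == "__main__":
    # Hypothetical currency-import records; field names are illustrative only.
    shipments = [
        {"country": "A", "party": "p1", "value": 100.0},
        {"country": "A", "party": "p2", "value": 50.0},
        {"country": "B", "party": "p3", "value": 120.0},
        {"country": "C", "party": "p4", "value": 90.0},
        {"country": "D", "party": "p5", "value": 900.0},
    ]
    by_country = roll_up(shipments, "country", "value")
    print(by_country)
    print(flag_anomalies(by_country))
```

Here country D's total (900) is more than three times the median country total (135), so it is flagged. The median is used instead of the mean because one very large group would inflate the mean and could mask itself.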
Using an On-Line Analytical Processing (OLAP) tool, NIPS leverages a manual approach to quickly respond to ad hoc queries across multiple databases.19

Functional capabilities include link analysis, rule-based intelligence, power search, summary reports, and geospatial integration. According to a senior ICE official, NIPS greatly enhances the overall capability of DHS to identify, target, and disrupt potential acts of terrorism.

The third system that uses collaboration is QID, which assists analysts in evaluating whether a document is counterfeit. Security personnel at airports, seaports, and borders use QID as a validation tool to view samples of internationally issued identity documents. Specifically, it allows analysts to compare questionable documents against genuine documents, such as passports and driver's licenses. QID has intranet connectivity as well as the capability to identify associations and patterns to determine whether documents are genuine. This process suggests to the analyst what to look for in determining the authenticity of a questioned document.

The fourth system that uses collaboration is TISS. This system facilitates the collection of tactical information related to suspicious activity.
It uses a combination of technologies and domain awareness to provide greater capability to observe and report suspicious activities that may indicate terrorist planning. TISS captures Federal Air Marshal Service observations of suspicious activity in aviation. The system enables Federal Air Marshals and other law enforcement officers in the field to report these observations into the database for analysis using the TISS Analytic Tool.

18 Extraction is a technique for reducing the number of attributes used in the processes of classification, segmentation, and pattern recognition. Within a document, entity extraction can also be customized to recognize pattern-based and list-based entities, events, and relationships.

19 OLAP enables users to analyze data across multiple dimensions, such as products, departments, regions, and time segments, typically to examine key business metrics.

In addition, TISS allows Federal Air Marshals to prepare and immediately submit reports of suspicious activity to the Federal Air Marshal Service Investigations Division. For example, an individual taking pictures of a building or videotaping the operations of a maritime port may raise suspicions compared to others who normally work at or visit such locations. Security or law enforcement officers can note this suspicious activity and immediately enter it into the system.
As another example, if an individual attempted to pass through airport security screening with a handgun at two separate airports on different occasions, the Federal Air Marshal Service could use the surveillance reports entered into TISS to establish a link between the incidents.

Advanced Analytics

Science and Technology (S&T) is developing an advanced analytics capability called Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE), as described in Table 6. ADVISE is an advanced information technology that can integrate information and facts from many different types of data. Because ADVISE is a "technology framework," it can be tailored and deployed for specific purposes and areas of interest. For example, it is being developed to incorporate chemical, biological, radiological, nuclear, and explosive threat and effects data. It is intended to ingest data from a variety of sources, ranging from highly structured content, such as database records, to unstructured content, such as message traffic.
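The survey does not describe how ADVISE ingests data. The hypothetical Python sketch below only illustrates the general idea of bringing structured and unstructured content into one form: a database record is mapped field by field into subject-relation-object triples, and a message is scanned for names from a fixed list (list-based entity extraction), with every pair of names found in the same message linked by a co-mention. The entity list, field names, and the "co_mentioned_with" relation are assumptions; operational systems rely on far richer language processing.

```python
import re

# Hypothetical list of known entities used for list-based extraction.
KNOWN_ENTITIES = ("Person A", "Person B", "Springfield")


def triples_from_record(record):
    """Map a structured record to (subject, field, value) triples keyed on its name."""
    subject = record["name"]
    return {(subject, field, value) for field, value in record.items() if field != "name"}


def triples_from_text(text, entities=KNOWN_ENTITIES):
    """Link every pair of known entities that are mentioned in the same text."""
    found = sorted(
        name for name in entities
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    )
    return {
        (first, "co_mentioned_with", second)
        for i, first in enumerate(found)
        for second in found[i + 1:]
    }


def ingest(records, messages):
    """Merge triples from structured records and unstructured messages into one set."""
    triples = set()
    for record in records:
        triples |= triples_from_record(record)
    for message in messages:
        triples |= triples_from_text(message)
    return triples


if __name__ == "__main__":
    records = [{"name": "Person A", "works_at": "Acme Shipping", "lives_in": "Springfield"}]
    messages = ["Person A and Person B were seen together in Springfield."]
    for triple in sorted(ingest(records, messages)):
        print(triple)
```

The sample produces five triples: two from the record's fields and three co-mentions from the message, all in one shape that a downstream graph could consume.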
Still in development, ADVISE will connect information extracted from text and images, databases, and simulation and modeling tools to provide a watch-and-warning system for analysts.

ADVISE employs semantic graphs to determine relationships and patterns among data, and multiple visualization techniques to display the resulting information.20 The Department seeks to predict threats and vulnerabilities, for example by detecting relationships between seemingly disjointed entities. Semantic graphs organize data entities regarding threats and vulnerabilities and link their relationships. Thus, hidden relationships in the data are uncovered by examining the structure and properties of the semantic graph. For example, a simple semantic graph can link people to their workplaces and towns as well as to their friends. Studying the links can help analysts understand the relationships between entities and identify threats and vulnerabilities.

20 A semantic graph is a network of heterogeneous nodes (points at which two lines or systems meet or cross) and links. Because these graphs are ideal for representing relationship and linkage information, they have emerged as a key technology for organizing DHS data.
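To make the report's people, workplaces, towns, and friends example concrete, the Python sketch below builds a tiny semantic graph with typed nodes and labeled links and then uses breadth-first search to find the shortest chain of links connecting two entities, the kind of indirect relationship the survey describes as hidden in the data. The class, names, and relation labels are hypothetical; the survey does not describe ADVISE's data model or algorithms.

```python
from collections import deque


class SemanticGraph:
    """Typed nodes joined by labeled links (each link stored in both directions)."""

    def __init__(self):
        self.node_type = {}  # node name -> kind, e.g. "person" or "town"
        self.links = {}      # node name -> list of (relation, neighbor)

    def add_node(self, name, kind):
        self.node_type[name] = kind
        self.links.setdefault(name, [])

    def link(self, a, relation, b):
        for name in (a, b):
            if name not in self.node_type:
                raise KeyError(f"unknown node: {name!r}")
        self.links[a].append((relation, b))
        self.links[b].append((relation, a))  # reverse direction keeps the same label

    def connection(self, start, goal):
        """Breadth-first search for the shortest chain of (node, relation, node) hops."""
        came_from = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                hops = []
                while came_from[node] is not None:
                    previous, relation = came_from[node]
                    hops.append((previous, relation, node))
                    node = previous
                return hops[::-1]
            for relation, neighbor in self.links[node]:
                if neighbor not in came_from:
                    came_from[neighbor] = (node, relation)
                    queue.append(neighbor)
        return None  # no chain of links connects the two nodes


if __name__ == "__main__":
    graph = SemanticGraph()
    for name, kind in [("Alice", "person"), ("Bob", "person"), ("Carol", "person"),
                       ("Acme Freight", "workplace"), ("Riverton", "town")]:
        graph.add_node(name, kind)
    graph.link("Alice", "works_at", "Acme Freight")
    graph.link("Bob", "works_at", "Acme Freight")
    graph.link("Bob", "friend_of", "Carol")
    graph.link("Carol", "lives_in", "Riverton")
    for hop in graph.connection("Alice", "Riverton"):
        print(hop)
```

Alice and Riverton share no direct link, yet the search surfaces a four-hop chain through a shared workplace and a friendship. The hop ("Acme Freight", "works_at", "Bob") traverses a link backward, which is why the sketch stores each label in both directions; a production graph would more likely keep directed edges and record the traversal direction.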
S&T expects that ADVISE's ability to apply semantic data fusion, link analysis, and unstructured text analysis will be a powerful capability that allows analysts to find the expected and discover the unexpected.

Table 6: Advanced Analytics

Data Mining Activity: Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE)
Purpose: Currently under development. Integrates the various information analysis and synthesis, visualization, and knowledge discovery component capabilities. Will incorporate comprehensive chemical, biological, radiological, nuclear, and explosive threat and effects data.
Component Mission: S&T – Protect the homeland by providing Federal and local officials with state-of-the-art technology and resources.

Management Comments

We provided a draft version of this report to the DHS OCIO and requested written comments from this office.
In response, the OCIO indicated that it had no comments on the draft report.

Appendix A
Purpose, Scope, and Methodology

Our objective was to identify and describe DHS' data mining activities. We researched various definitions and prepared a definition of data mining that focused on applications and tools used for knowledge discovery and analytics. We shared this definition with interviewees during our survey.

To obtain a selection of data mining activities for review, we extracted an initial list of systems using key word searches of the DHS Trusted Agent FISMA database. We refined the list by removing systems relating to administrative functions or efforts that did not contribute toward DHS counterterrorism efforts. We sent lists to components to verify whether the specified systems performed data mining activities and to add applicable systems that were not on the list. Officials from FEMA, FLETC, and US-VISIT responded that they do not conduct any data mining activities.

We interviewed officials from the OCIO and the Privacy Office. We also conducted interviews with CIOs, program managers, and officials representing CBP, CIS, FEMA, ICE, OIA, S&T, TSA, and USSS. The purposes of our interviews were to clarify the objective of each system, describe data mining activities, and obtain documentation. Through a data call to database administrators, we also collected and reviewed technical information and documentation on each system, including the tools and techniques used to conduct data mining activities.

We conducted fieldwork in the Washington, DC, metropolitan area. Our analysis is based upon direct observation, review of applicable documentation, and interviews.
We conducted our survey from November 2005 through April 2006 under the authority of the Inspector General Act of 1978, as amended, and according to generally accepted government auditing standards.

The principal OIG points of contact for the survey are Frank Deffer, Assistant Inspector General for Information Technology Audit, (202) 254-4100, and Marj Leaming, Director, Special Projects Audit Division, (202) 254-4172. Major OIG contributors to the survey are identified in Appendix B.

Appendix B
Major Contributors to this Report

Special Projects Division

Marj Leaming, Director
Barbara Ferris, Audit Manager
Audilia Wedderburn, Auditor
Juliana Meek, Student Temporary Employment Program
Scott Binder, IT Auditor

Technical Consultants

Jesus Mena, Data Mining Consultant
Michael Pridgen, Database Consultant
Richard Streeter, Database Consultant

Advanced Technology Division

Michael Goodman, Security Engineer

Appendix C
Report Distribution

Department of Homeland Security

Secretary
Deputy Secretary
Chief of Staff
Deputy Chief of Staff
General Counsel
Executive Secretary
Assistant Secretary for Policy
Assistant Secretary for Public
DHS Legislative and Intergovernmental Affairs
DHS GAO OIG Audit Liaison
Director, Operations Directorate
Chief Privacy Officer
Chief Information Officer
Component Chief Information Officers

Office of Management and Budget

Chief, Homeland Security Branch
DHS OIG Budget Examiner

Congress

Congressional Oversight and Appropriations Committees, as appropriate

Additional Information and Copies

To obtain additional copies of this report, call the Office of Inspector General (OIG) at (202) 254-4100, fax your request to (202) 254-4285, or visit the OIG web site at www.dhs.gov/oig.

OIG Hotline

To report alleged fraud, waste, abuse or mismanagement, or any other kind of criminal or noncriminal misconduct relative to department programs or operations, call the OIG Hotline at 1-800-323-8603; write to DHS Office of Inspector General/MAIL STOP 2600, Attention: Office of Investigations – Hotline, 245 Murray Drive, SW, Building 410, Washington, DC 20528; fax the complaint to (202) 254-4292; or email DHSOIGHOTLINE@dhs.gov. The OIG seeks to protect the identity of each writer and caller.